image-text pair
Supplementary Material - WikiDO: A New Benchmark Evaluating Cross-Modal Retrieval for Vision-Language Models
This has been addressed in prior work [4, 3] by finetuning VLMs on a given corpus for a given task [5] and conducting zero-shot evaluations on a new corpus [7]. However, the mere use of an unseen corpus for evaluation does not imply it is OOD. Q1 What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? Please provide a description. (a) We provide 384k image-text pairs. Q3 Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? If the dataset is a sample, then what is the larger set?
- North America > United States (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Europe > Poland (0.04)
- Europe (0.04)
- South America > Brazil > São Paulo (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- (3 more...)
COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs
Despite their demonstrated utility for NLP, multimodal counterfactual examples have been relatively unexplored due to the difficulty of creating paired image-text data with minimal counterfactual changes. To address this challenge, we introduce a scalable framework for automatic generation of counterfactual examples using text-to-image diffusion models.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Law (1.00)
- Government (0.92)
- Information Technology > Security & Privacy (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Israel (0.05)
- Europe > Poland (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)